A pipelined framework is proposed for accurate, automated, simultaneous segmentation of the liver and
the hepatic tumors from computed tomography (CT) images. The framework is composed of three pipelined
levels. First, two different pre-trained (transfer learning) deep convolutional neural networks (CNN) are
applied to obtain high-level compact features of the CT images. Second, a pixel-wise classifier is used to
obtain two output classified maps, one for each CNN model. Finally, a fusion neural network (FNN) is used
to integrate the two maps. Experiments performed on the MICCAI’2017 database of the liver tumor
segmentation (LITS) challenge result in a Dice similarity coefficient (DSC) of 93.5% for the segmentation of
the liver and of 74.40% for the segmentation of the lesion, using a 5-fold cross-validation scheme.
Comparative results with the state-of-the-art techniques on the same data show the competitive performance
of the proposed framework for simultaneous liver and tumor segmentation.

Keywords: Computed tomography; Deep learning; Liver; Segmentation; Tumors

1. INTRODUCTION 

According to the World Health Organization (WHO), liver cancer is one of the leading causes of cancer
deaths among all types of cancers. Worldwide, around 800,000 cases of liver cancer are diagnosed each year,
accounting for around 700,000 deaths [1]. In 2019, the American Cancer Society (ACS) estimated around
42,030 new cases of primary liver cancer and intrahepatic bile duct cancer in the United States, with around
31,780 deaths [1]. These figures reflect the growing burden of liver cancer.

Computed tomography (CT) imaging is usually used for liver segmentation and/or liver cancer
detection. However, manual segmentation of the liver and/or the liver tumors from CT images is
time-consuming and suffers from observer variability. Therefore, the design of efficient computer-aided
diagnostic (CAD) systems, to assist radiologists in liver segmentation and/or liver cancer segmentation, is a
widely investigated open research problem. Throughout the literature, different methodologies have been
utilized for liver segmentation and/or liver cancer segmentation. These methods can be categorized as
traditional methods or deep learning methods.

Traditional approaches usually extract features (e.g., intensity, texture, and shape) from liver CT images
and use a classifier based on these features to perform the segmentation process. On the other hand, deep
learning methods usually use convolutional neural networks (CNN) that consist of a number of convolutional
layers for extracting low-level and high-level features from the liver CT images and fully connected layers to
encode a compact feature set for the segmentation process.

For the task of liver segmentation, a preliminary step in many CAD systems for liver cancer [2] and
liver fibrosis [3], different traditional and deep learning methods have been applied. For example, Barstugan
et al. [4] used a super-pixel linear iterative clustering approach and the AdaBoost algorithm to segment the
liver, achieving a DSC of 92.13% on 16 abdominal CT test images. Muthuswamy and Kanmani [5] extracted the
liver from CT images based on intensity thresholding, fuzzy c-means clustering, and connected component
analysis. For tumor segmentation, Chang et al. [6] segmented the tumors using a region growing algorithm. A
binary logistic regression analysis based on extracted texture, shape, and kinetic curve features was further
performed to classify the segmented tumors as benign or malignant. Chlebus et al. [7] used a modified
U-net [8] architecture, consisting of four resolution levels, for liver tumor segmentation. Yuan [9] used
hierarchical deep fully convolutional-deconvolutional neural networks (CDNN) for tumor segmentation. An
initial liver segmentation was provided using a simple CDNN model. The segmented liver region was refined
using another CDNN to find the final liver segmentation, enhanced by histogram equalization. Then a third
CDNN was applied for tumor segmentation. Bi et al. [10] used deep residual networks (ResNet) for liver and
lesion segmentation. Gruber et al. [11] applied two U-net [8] networks sequentially for liver and lesion
segmentation. Wang et al. [12] used a 3D atlas-based model for liver segmentation. Shi et al. [13] utilized a
deformable shape liver segmentation method. Song et al. [14] implemented a modified U-Net model for liver
segmentation. Although the methods presented in the literature achieved good results, their accuracy still
needs to be improved. The present study presents a deep learning system for simultaneous liver and tumor
segmentation using CNN modeling. The main contributions of this work are as follows:
- Investigating different deep learning architectures for liver and tumor segmentation (i.e., Densenet and
  FCN-Alexnet)
- Applying a 3D narrow-band of the input images to enhance the deep training
- Using a smart fusion of two CNN architectures to improve the segmentation quality
- Performance evaluation on the MICCAI’2017 challenge liver tumor segmentation (LITS) database.

The structure of this paper is as follows. Section 2 presents the proposed system for
simultaneous liver and tumor segmentation. Section 3 summarizes the results of the proposed system as well as
the comparative results against the current state-of-the-art techniques. Finally, Section 4 concludes the paper.


2. METHODS 

The proposed framework processes a raw image through three stages as shown in Figure 1. First, 
features are extracted from raw images, without preprocessing steps, by investigating two different CNN 
models. Second, a pixel-wise classification layer is applied. Finally, a smart fusion of the outputs of the two 
CNN models is performed using a neural network (NN) to provide the final simultaneous liver and tumor 
segmentation map, containing three output labels: background (BG), liver, and lesion. 


[Figure 1 diagram: input liver CT images (512 x 512) -> Step 1: feature extraction (two CNN models) ->
Step 2: classification (o/p: model 1, o/p: model 2) -> Step 3: fusion (neural network) -> final output
(lesion, BG, liver)]


Figure 1. Proposed framework for the liver and lesions segmentation with three stages: feature extraction 
using deep learning, classification based on pixel-wise technique, and smart fusion 


2.1. Feature extraction 

Herein, two pre-trained CNNs are used to obtain the features of the liver and its lesions: Densenet [15]
and the fully convolutional network (FCN) using Alexnet (FCN-Alexnet [16]). The Densenet model consists of
a down-sampling path and an up-sampling path. The down-sampling path extracts the semantic features, and
the up-sampling path is trained to recover the image resolution of the input at the output of the model.
Figure 2 shows the architecture of the Densenet model.


[Figure 2 diagram: Dense Block 1 -> Dense Block 2 -> Dense Block 3]

Figure 2. Schematic diagram of the structure of the Densenet model [15]

FCN-Alexnet consists of an encoder, a decoder, and a pixel-wise classifier as shown in Figure 3.
The job of the encoder is to extract high-level compact deep learning features from the abdominal liver CT
images. For the FCN-Alexnet model, the encoder stage consists of five Alexnet layers as shown in
Table 1, with no fully connected layers as shown in Figure 3. The job of the decoder is to perform
deconvolutional steps so that the extracted features have the same dimensions as the input image. To perform
the segmentation process, a pixel-wise classification layer is further used to assign each pixel
in the CT input image one of three labels: lesion, liver, or background.


[Figure 3 diagram: Encoder (#5 Conv. and #2 FC layers, Alex model) -> Decoder (#1 deconv. layer, Alex) ->
Pixel-wise classifier]


Figure 3. Schematic diagram of the structure of FCN-Alexnet [16]


The pre-trained Densenet and Alexnet were trained on the ImageNet large-scale visual recognition
challenge 2012 (LSVRC2012) dataset, which is composed of 10 million training images of size 224x224 from
more than 1000 subjects. The Densenet network is composed of contiguous dense blocks, with transition
layers (convolutional layers and average pooling) between the contiguous dense blocks, and has more than
20 million parameters. Within a dense block, the feature maps have the same size so that they can be
concatenated easily. A global average pooling layer and a SoftMax classifier follow the last dense block. The
Alexnet model is composed of five convolutional layers and three FC layers with more than 62.4 million
parameters. More details of each model can be found in [15] and [16], respectively. We applied the two models
(Densenet and FCN-Alexnet) in the proposed system, since their decoders produce outputs that are of the same
dimensions as the input image, which suits the task of segmentation. In addition, they have shown outstanding
performance in several related applications, such as lung segmentation [17], [18], pulmonary
cancer detection [19], face recognition [20], brain cancer [21], and diabetic retinopathy [22].


2.2. Classification 

A pixel-wise classifier is applied after each model’s decoder to label the segmented output image.
The pixel-wise classifier is composed of two layers: a SoftMax layer and a weighted layer that performs the
pixel-wise classification. The SoftMax layer is composed of three SoftMax nodes per image pixel, providing
the probabilities of the three labels (lesion, liver, or background), as in (1).


$$\sigma(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}} \qquad (1)$$

where $x_i$ denotes the input at SoftMax node $i$ and $\sigma(\cdot)$ denotes the output probability of the SoftMax node.
The weights of the pixel classification layer are trained using the LITS database. Based on the largest 
SoftMax probability, the pixel-wise classification layer provides the final output label for each pixel to be 
either lesion, liver, or background. 
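
The pixel-wise decision described above can be illustrated with a minimal Python sketch. It is not the
implementation used in this work (which relies on MATLAB); the label ordering, the function names, and the
omission of the trained class weights are assumptions made only for illustration.

```python
import math

# Assumed label ordering; the paper does not specify one.
LABELS = ("background", "liver", "lesion")

def softmax(scores):
    """SoftMax probabilities for the three class scores of one pixel, as in (1)."""
    peak = max(scores)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify_pixel(scores):
    """Return the label with the largest SoftMax probability and the probabilities."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs

def classify_map(score_map):
    """Label every pixel of a 2D map whose entries are 3-element score lists."""
    return [[classify_pixel(pixel)[0] for pixel in row] for row in score_map]

if __name__ == "__main__":
    label, probs = classify_pixel([0.2, 2.5, 1.0])
    print(label, [round(p, 3) for p in probs])
```

Because the SoftMax function is monotonic, the selected label is the one with the largest input score; the
probabilities matter mainly during training, where they enter the cross-entropy loss.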


2.3. Fusion neural network (FNN) 

To investigate the potential of fusing the extracted deep learning features from the two utilized deep
learning models (Densenet and FCN-Alexnet), an FNN is designed to integrate the strengths of each model.
The proposed FNN consists of an input layer, one fully connected hidden layer, and an output layer as shown
in Figure 4. The input layer of the FNN receives the two labeled images (the outputs of the
Densenet and the FCN-Alexnet models). The hidden layer is composed of H1 = 100 nodes, a number
selected experimentally, all with tanh activation functions. The output layer produces the
final fused labeled image with the same dimensions as the input images. Figure 4 shows a typical
example of fusion, in which the proposed FNN improves on the outputs of both models.


[Figure 4 diagram: model outputs -> proposed fusion neural network (FNN: hidden layer, output layer) ->
final output. DSC annotations shown on the panels: liver=72%, tumor=62%; liver=96%, tumor=72%;
liver=92%, tumor=70%]


Figure 4. Architecture of the proposed FNN 
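
As an illustration of the fusion step, the following Python sketch runs the forward pass of a small network
with one hidden layer of 100 tanh nodes. The text does not specify how the two labeled images are encoded at
the input, so the per-pixel one-hot encoding, the SoftMax output, and the random untrained weights are all
assumptions; the sketch only shows the data flow, not the trained FNN of this work.

```python
import math
import random

NUM_CLASSES = 3  # background, liver, lesion
HIDDEN = 100     # H1 = 100 tanh nodes, as stated in the text

def one_hot(label):
    """Assumed input encoding: one-hot vector of a class index."""
    return [1.0 if k == label else 0.0 for k in range(NUM_CLASSES)]

def init_fnn(seed=0):
    """Random (untrained) weights; a real FNN would be fitted on training data."""
    rng = random.Random(seed)
    n_in = 2 * NUM_CLASSES
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(HIDDEN)]
    b1 = [0.0] * HIDDEN
    w2 = [[rng.gauss(0.0, 0.1) for _ in range(HIDDEN)] for _ in range(NUM_CLASSES)]
    b2 = [0.0] * NUM_CLASSES
    return w1, b1, w2, b2

def fuse_pixel(params, label_1, label_2):
    """Forward pass for one pixel: two model labels in, class probabilities out."""
    w1, b1, w2, b2 = params
    x = one_hot(label_1) + one_hot(label_2)
    hidden = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    scores = [sum(w * v for w, v in zip(row, hidden)) + b
              for row, b in zip(w2, b2)]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_maps(params, map_1, map_2):
    """Fuse two label maps of equal size into one label map, pixel by pixel."""
    fused = []
    for row_1, row_2 in zip(map_1, map_2):
        fused_row = []
        for a, b in zip(row_1, row_2):
            probs = fuse_pixel(params, a, b)
            fused_row.append(max(range(NUM_CLASSES), key=probs.__getitem__))
        fused.append(fused_row)
    return fused
```

With trained weights, such a network can learn which model to trust for each combination of labels, which
is one plausible reading of how the fusion integrates the strengths of the two models.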


2.4. Performance metrics 

To evaluate the performance of the proposed system for liver and tumor segmentation, two metrics are
used to assess the quality of segmentation: an area-based metric, the Dice similarity coefficient (DSC),
and a distance-based metric, the average symmetric surface distance (ASSD).
The DSC [23] represents the area overlap between the segmented image (S) and the ground truth (GT)
image:


$$\mathrm{DSC}(S, GT) = \frac{|S \cap GT|}{0.5\,(|S| + |GT|)} \times 100\% \qquad (2)$$


where the $|\cdot|$ operator denotes the object area.
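
As a worked example of the DSC, the short Python sketch below counts overlapping pixels of one class in two
label maps. The function name, the list-of-lists image representation, and the convention of returning 100%
when both objects are empty are assumptions made for illustration.

```python
def dice_similarity(seg, gt, label):
    """DSC in percent between two 2D label maps for one class label."""
    seg_px = {(r, c) for r, row in enumerate(seg)
              for c, v in enumerate(row) if v == label}
    gt_px = {(r, c) for r, row in enumerate(gt)
             for c, v in enumerate(row) if v == label}
    denom = 0.5 * (len(seg_px) + len(gt_px))
    if denom == 0:
        return 100.0  # assumption: two empty objects count as perfect agreement
    return len(seg_px & gt_px) / denom * 100.0

if __name__ == "__main__":
    seg = [[0, 1, 1],
           [0, 1, 0]]
    gt = [[0, 1, 1],
          [0, 0, 0]]
    # |S| = 3, |GT| = 2, overlap = 2, so DSC = 2 / 2.5 = 80%
    print(round(dice_similarity(seg, gt, label=1), 1))
```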

On the other hand, the ASSD [24] measures the distance between the segmented object surface and
its corresponding GT segmentation surface. It is defined as the average of the Euclidean distances, $d$, from
(i) all points, $x$, on the surface of the segmented object ($S_s$) to the surface of the GT ($GT_s$) and
(ii) all points on $GT_s$ to $S_s$:


$$\mathrm{ASSD}(S, GT) = \frac{1}{|S_s| + |GT_s|} \left( \sum_{x \in S_s} d(x, GT_s) + \sum_{x \in GT_s} d(x, S_s) \right) \qquad (3)$$
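
The symmetric surface distance can be sketched in a few lines of Python. The surface definition (object
pixels with a 4-connected background or border neighbour), the brute-force nearest-point search, and the use
of pixel units instead of physical voxel spacing are assumptions of this sketch, not details taken from [24].

```python
import math

def surface_pixels(mask):
    """Object pixels that touch the background or the image border (4-neighbourhood)."""
    rows, cols = len(mask), len(mask[0])
    surface = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if not (0 <= nr < rows and 0 <= nc < cols) or not mask[nr][nc]:
                    surface.append((r, c))
                    break
    return surface

def min_distance(point, others):
    """Euclidean distance from a point to the closest point of a surface."""
    return min(math.dist(point, q) for q in others)

def assd(seg_mask, gt_mask):
    """Average symmetric surface distance between two binary masks, in pixels."""
    s_surf = surface_pixels(seg_mask)
    g_surf = surface_pixels(gt_mask)
    if not s_surf or not g_surf:
        raise ValueError("ASSD is undefined when either object is empty")
    total = sum(min_distance(p, g_surf) for p in s_surf)
    total += sum(min_distance(q, s_surf) for q in g_surf)
    return total / (len(s_surf) + len(g_surf))
```

The brute-force search is quadratic in the number of surface points; for full 512x512 slices a distance
transform would normally be preferred, but the simple version makes the definition explicit.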


3. EXPERIMENTAL RESULTS AND DISCUSSION 
In this section, the LITS challenge database, the experimental setup, and the comparative results
against other methods are detailed.


3.1. LITS database 

The LITS challenge database [25], [26] consists of 130 contrast-enhanced abdominal CT training
scans collected from seven different clinical institutions. The training CT scans are provided with manual
segmentations of the liver and liver lesions done by trained radiologists. The volumes contain different
numbers of axial slices (42 to 1026 cross-sections per volume), with an overall number of 16,917 images. The
size of each CT image is 512x512 pixels. The data are described in detail in [25] and [26].


3.2. Experimental setting 

Model 1 (Densenet) and Model 2 (FCN-Alexnet) are trained using the LITS competition database
as follows: initially, all the encoder weights are initialized by transferring the pretrained weights of the
Densenet network in [15] and Alexnet in [16], respectively. In the training phase, all encoder layers
and decoder layers are fine-tuned using the LITS data. The training epochs are repeated until the
cross-entropy loss is very small or the number of epochs exceeds 30. Inputs are shuffled in each epoch using a
mini-batch size of 500. Learning rates are set to 10° for model 1 and for model 2 to afford higher parameter
tuning. FNN training applies the same training setting. All training phases are implemented using
MATLAB® 2018a. Over-fitting is avoided by reducing the network's capacity: removing layers (the fully
connected layer in the pre-trained Alexnet network in model 2) and reducing the number of elements in the
hidden layer of the fusion network.

Five-fold cross-validation is used to evaluate the proposed system. Two modes are used for the
input data: “Duplicate” and “3D narrow-band”, as shown in Figure 5. In the “Duplicate” mode, the input data
is composed of the same grey-level image duplicated at each of the three standard channels of the utilized
deep learning model. In the “3D narrow-band” mode, the input data is composed of three consecutive anatomical
grey-level images (i.e., the target image is fed to the central input channel of the CNN model and
the previous and next cross-sections are fed to the two side channels).

[Figure 5 panels: (a) and (b), each showing an original image (top) and its GT image (bottom)]


Figure 5. The data are input to the proposed system using two modes: (a) “Duplicate” and (b) “3D
Narrow-band”. Original image (top row) and GT image (bottom row)
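
The two input modes can be sketched as follows in Python, treating a CT volume as a list of slices. The
function names are hypothetical, and repeating the edge slice for the first and last cross-sections of a
volume is an assumption, since the handling of volume boundaries is not described here.

```python
def duplicate_input(volume, index):
    """'Duplicate' mode: the target slice copied into all three input channels."""
    target = volume[index]
    return [target, target, target]

def narrow_band_input(volume, index):
    """'3D narrow-band' mode: previous, target, and next slices as the three channels."""
    last = len(volume) - 1
    # Assumption: at the volume boundaries the edge slice is repeated.
    prev_slice = volume[max(index - 1, 0)]
    next_slice = volume[min(index + 1, last)]
    return [prev_slice, volume[index], next_slice]

if __name__ == "__main__":
    volume = ["slice0", "slice1", "slice2", "slice3"]  # stand-ins for 512x512 images
    print(duplicate_input(volume, 2))
    print(narrow_band_input(volume, 2))
    print(narrow_band_input(volume, 0))
```

Both modes produce three channels, so the same pre-trained three-channel encoders can be used without
changing their input layers.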


The five-fold cross-validation is applied with two different settings:
“global” and “per case”. The “global” setting applies the 5-fold cross-validation on all 16,917 images
of the 130 scans (i.e., 3,383 test images (20% of the images) and 13,534 training images (80% of the images)).
In contrast, the “per case” setting divides the data by case (subject or scan) and applies the 5-fold
cross-validation on the 130 separate scans (i.e., 26 test scans (20% of the scans)
and 104 training scans (80% of the scans)). Cross-entropy is used as the objective function to train the
network using the ADAM optimizer [27]. Median frequency balancing is used, where the weight assigned to
a class in the loss function is the ratio of the median of the class frequencies to the frequency of that class.
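
The “per case” split and the class weighting can be sketched in Python as below. The round-robin assignment
of scans to folds and the particular definition of class frequency (pixels of a class divided by the total
pixels of the images in which that class appears) are assumptions; the text does not state how scans were
assigned to folds or how frequencies were counted.

```python
from collections import Counter
from statistics import median

def per_case_folds(scan_ids, k=5):
    """Split scan identifiers into k folds so that no scan is shared between folds."""
    folds = [[] for _ in range(k)]
    for i, scan_id in enumerate(sorted(scan_ids)):
        folds[i % k].append(scan_id)  # assumed round-robin assignment
    return folds

def train_test_split(folds, test_fold):
    """Use one fold for testing and the remaining folds for training."""
    test = folds[test_fold]
    train = [s for i, fold in enumerate(folds) if i != test_fold for s in fold]
    return train, test

def median_frequency_weights(label_maps):
    """Class weight = median class frequency / frequency of that class."""
    pixel_count = Counter()   # pixels of each class over all images
    image_pixels = Counter()  # total pixels of the images containing the class
    for label_map in label_maps:
        flat = [v for row in label_map for v in row]
        for cls, n in Counter(flat).items():
            pixel_count[cls] += n
            image_pixels[cls] += len(flat)
    freq = {cls: pixel_count[cls] / image_pixels[cls] for cls in pixel_count}
    med = median(freq.values())
    return {cls: med / f for cls, f in freq.items()}

if __name__ == "__main__":
    folds = per_case_folds(range(130))
    train, test = train_test_split(folds, test_fold=0)
    print(len(train), len(test))  # 104 training scans, 26 test scans
```

Rare classes such as the lesion receive weights above one, which counteracts the dominance of background
pixels in the cross-entropy loss.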


3.3. Experimental results 

To assess the system performance quantitatively, Table 1 provides detailed liver and tumor
segmentation results for each utilized CNN model (Densenet and FCN-Alexnet) as well as the proposed
fused system. Consistent with the visual results in Figure 6, the FCN-Alexnet model performs better
than the Densenet network. This is attributed to the simpler structure of the FCN-Alexnet (its encoder
contains only five convolutional layers plus 2 fully connected layers, which are easy to train efficiently)
compared to the Densenet (which contains 201 layers [15], making its training rather complex and prone to
overfitting). In addition, Tables 1 and 2 and Figure 6 show that the proposed FNN fusion further improves the
performance. As expected, the “3D Narrow-band” mode achieves better results than the “Duplicate” mode,
since it takes into account extended 3D narrow-band anatomical information of the object. However,
Table 1 shows that while the “3D Narrow-band” mode achieves better results for tumor segmentation for all
three compared systems (Densenet, FCN-Alexnet, and the proposed system), it fails to enhance the liver
segmentation results. This is because the liver anatomy changes very little between
consecutive images, while the tumor anatomy changes significantly due to its relatively small size compared
with the liver.


Table 1. Liver and tumor segmentation results for each utilized deep learning model (Densenet,
FCN-Alexnet, and FNN). For each model, results are compared for two modes
(“Duplicate” and “3D Narrow-band”)


                                      Liver                           Tumor
                                      “Global”        “Per case”      “Global”        “Per case”
Model                 Mode            DSC     ASSD    DSC     ASSD    DSC     ASSD    DSC     ASSD
Model 1: Densenet     Duplicate       82.8%   3.89    76.5%   4.95    69.7%   3.87    62.6%   4.12
Model 1: Densenet     3D Narrow-band  82.8%   3.89    76.5%   4.95    73.0%   2.76    64.8%   3.07
Model 2: FCN-Alexnet  Duplicate       96.9%   0.89    91.4%   1.32    76.3%   3.21    66.1%   3.87
Model 2: FCN-Alexnet  3D Narrow-band  96.9%   0.85    91.4%   1.35    78.2%   2.43    68.9%   3.18
Proposed: FNN fusion  Duplicate       97.2%   0.74    93.5%   0.99    78.8%   2.36    70.0%   3.11
Proposed: FNN fusion  3D Narrow-band  97.2%   0.72    93.5%   0.77    79.9%   0.92    74.4%   0.99


[Figure 6 panels: input (original test image and ground truth (GT)); Model 1 (Dice, liver=82%; Dice,
tumor=69%); Model 2 (Dice, liver=91%; Dice, tumor=72%); proposed FNN (Dice, liver=93%; Dice, tumor=74%);
GT (Dice, liver=100%; Dice, tumor=100%)]


Figure 6. Sample segmentation results for the “3D Narrow-band” mode. The first column contains the input and
GT segmentation. The second, third, fourth, and last columns provide the results of Model 1, Model 2, the
proposed FNN, and the GT segmentation, respectively, for the liver (first row) and the tumor (second row)


3.4. Comparative results 

Results are compared to the related state-of-the-art methods on the LITS competition database to
quantify the strength of the proposed system, as shown in Table 2. The proposed FNN fusion system achieves
superior performance for tumor segmentation, evidenced by the highest “per case” DSC and the smallest
“per case” ASSD among all the compared methods. However, its liver segmentation DSC is lower than the best
result among the related models. Accurate liver segmentation is clinically less important than accurate
tumor segmentation, e.g., when assisting radiologists in liver cancer cases. Future work
will investigate how to increase the performance, especially for the liver segmentation.


Table 2. Comparative results between the proposed system and the related state-of-the-art methods using the
same database, consisting of 130 scans


                                                                                                   “Per case” DSC
Paper                   Experimental setup                               Method                                Liver   Tumor
Bi et al. [10]          Train size=118, Test size=13                     Cascaded ResNet (Multi-scale Fusion)  95.1%   50.1%
Elmenabawy et al. [28]  4-fold validation, Train size=97, Test size=33   FCN-Alexnet with preprocessing        90.4%   62.4%
Proposed framework      5-fold validation, Train size=104, Test size=26  Fusing Densenet and FCN-Alexnet       93.5%   74.40%


4. CONCLUSION 

In this paper, a CAD system for simultaneous liver and tumor segmentation is presented, based on 
the efficient fusion of two deep learning CNN models, trained using 3D narrow-band data. The system 
performance is evaluated on the challenging LITS database, achieving superior performance over competing 
methods for liver tumor segmentation. In the future, different CNN architectures as well as different fusion 
models will be investigated to improve the segmentation accuracy. 